Rank Consistent Estimation: The DOP Case
Abstract
The goal of an estimator is to approximate the unknown distribution of a language from partial evidence. In this thesis, a rank consistent estimator is defined as an estimator that preserves the frequency ranking of all full parse trees in the training treebank. The rank consistency property adopts Laplace’s Principle of Insufficient Reason for statistical parsing: a rank consistent estimator assigns the same probability to all trees that occur the same number of times in the training data. This thesis presents the first non-trivial DOP estimator in which the treebank is treated not only as a stochastic generating system but also as a sample of the stochastic process. The existing DOP definitions of probability and derivation for full parse trees are generalized to subtrees. Fragments in the treebank’s fragment corpus are assigned weights so that their probabilities are proportional to their relative frequencies. The estimator is proved to be rank consistent with respect to the training treebank, and this theoretical property is substantiated by empirical results: the new estimator outperforms the DOP1 estimator on the OVIS corpus.
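The rank consistency property described in the abstract can be illustrated with a small check. The sketch below is not the thesis's estimator. It assumes a toy treebank of bracketed strings and a hypothetical helper, `is_rank_consistent`, which tests whether a probability assignment gives equal probability to trees with equal counts and keeps more frequent trees strictly more probable. The strictness of the ordering is an assumption made for this illustration.

```python
from collections import Counter
from itertools import combinations
from math import isclose


def is_rank_consistent(counts, probs, tol=1e-12):
    """Hypothetical check: do `probs` preserve the frequency ranking in `counts`?

    Assumption for this sketch: equal counts require equal probabilities,
    and a strictly higher count requires a strictly higher probability.
    """
    for t1, t2 in combinations(counts, 2):
        c1, c2 = counts[t1], counts[t2]
        p1, p2 = probs[t1], probs[t2]
        if c1 == c2:
            # Trees seen equally often must receive the same probability.
            if not isclose(p1, p2, abs_tol=tol):
                return False
        elif (c1 > c2) != (p1 > p2):
            # The more frequent tree must be the more probable one.
            return False
    return True


# Toy treebank (made-up trees, for illustration only).
treebank = ["(S a)", "(S a)", "(S a)", "(S b)", "(S b)", "(S c)", "(S d)"]
counts = Counter(treebank)
total = sum(counts.values())

# The relative frequency estimator over full trees satisfies the check.
relative_frequency = {tree: c / total for tree, c in counts.items()}

# An assignment that separates two equally frequent trees does not.
broken = dict(relative_frequency)
broken["(S c)"] += 0.05
broken["(S d)"] -= 0.05

print(is_rank_consistent(counts, relative_frequency))
print(is_rank_consistent(counts, broken))
```

The relative frequency assignment passes because each probability is a fixed multiple of the tree's count. The second assignment fails because `(S c)` and `(S d)` each occur once but receive different probabilities.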
Similar resources
A Consistent and Efficient Estimator for Data-Oriented Parsing
Given a sequence of samples from an unknown probability distribution, a statistical estimator aims at providing an approximate guess of the distribution by utilizing statistics from the samples. One crucial property of a ‘good’ estimator is that its guess approaches the unknown distribution as the sample sequence grows large. This property is called consistency. This paper concerns estimators f...
A Consistent and Efficient Estimator for the Data-oriented Parsing Model
Given a sequence of samples from an unknown probability distribution, a statistical estimator aims at providing an approximate guess of the distribution by utilizing statistics from the samples. One desired property of an estimator is that its guess approaches the unknown distribution as the sample sequence grows large. Mathematically speaking, this property is called consistency. This thesis p...
Structured Parameter Estimation for LFG-DOP using Backoff
Despite its state-of-the-art performance, the Data Oriented Parsing (DOP) model has been shown to suffer from biased parameter estimation, and the good performance seems more the result of ad hoc adjustments than correct probabilistic generalization over the data. In recent work, we developed a new estimation procedure, called Backoff Estimation, for DOP models that are based on Phrase-Structur...
Improved Wi-Fi AP position estimation using regression based approach
This paper describes the improved Wi-Fi AP position estimation method for building more accurate Wi-Fi AP position DB in complex indoor signal propagation environment. One of our pro Fi AP by using indoor survey for higher Wi-Fi AP position accuracy. Our contribution focuses on the Wi-Fi AP position estimation method. In previous works, there were several methods for Wi-Fi AP position estimatio...
On the Statistical Consistency of DOP Estimators
A statistical estimator attempts to guess an unknown probability distribution by analyzing a sample from this distribution. One desirable property of an estimator is that its guess is increasingly likely to get arbitrarily close to the actual distribution as the sample size increases. This property is called consistency. Data Oriented Parsing (DOP) employs all fragments of the trees in a traini...